Gradient Diversity Empowers Distributed Learning
Abstract
It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. We introduce the notion of gradient diversity that measures the dissimilarity between concurrent gradient updates, and show its key role in the performance of mini-batch SGD. We prove that on problems with high gradient diversity, mini-batch SGD is amenable to better speedups, while maintaining the generalization performance of serial (one sample) SGD. We further establish lower bounds on convergence where mini-batch SGD slows down beyond a particular batch-size, solely due to the lack of gradient diversity. We provide experimental evidence indicating the key role of gradient diversity in distributed learning, and discuss how heuristics like dropout, Langevin dynamics, and quantization can improve it.
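As a concrete illustration of the quantity at the center of this abstract, the sketch below computes a gradient-diversity-style ratio from per-example gradients. It assumes the measure is the sum of squared per-example gradient norms divided by the squared norm of their sum; the function name and the NumPy-based setup are illustrative choices, not taken verbatim from the paper.

```python
import numpy as np

def gradient_diversity(per_example_grads):
    """Assumed form of the diversity ratio:
    sum_i ||g_i||^2 / ||sum_i g_i||^2.
    Values near 1/n mean the concurrent gradients are nearly identical;
    values near 1 or above mean they are dissimilar, which the abstract
    links to better mini-batch speedups.
    """
    grads = np.asarray(per_example_grads)                  # shape (n, d)
    sum_of_sq_norms = np.sum(np.square(grads))             # sum_i ||g_i||^2
    sq_norm_of_sum = np.sum(np.square(grads.sum(axis=0)))  # ||sum_i g_i||^2
    return sum_of_sq_norms / sq_norm_of_sum

# Toy check: identical gradients give 1/n, roughly orthogonal ones give about 1.
rng = np.random.default_rng(0)
identical = np.repeat(rng.normal(size=(1, 10)), 8, axis=0)
random_grads = rng.normal(size=(8, 10))
print(gradient_diversity(identical))     # 1/8: low diversity
print(gradient_diversity(random_grads))  # close to 1 on average: high diversity
```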
Similar resources
The existence of a unimodal or monotonic pattern in species richness and diversity along an elevational gradient: a case study in Heydari Wildlife Refuge, NE Iran
This article presents an analysis of plant species richness and diversity in relation to several climatic variables along a 1500-m elevation gradient on the Binalood Mountain in Heydari Wildlife Refuge (HWR), northeastern Iran. Two hundred and thirteen nested-sampling quadrats were established and the abundance of plants was recorded. Vegetation sampling was carried out from 2014 to 2016, foll...
Brief Announcement: Byzantine-Tolerant Machine Learning
We report on Krum, the first provably Byzantine-tolerant aggregation rule for distributed Stochastic Gradient Descent (SGD). Krum guarantees the convergence of SGD even in a distributed setting where (asymptotically) up to half of the workers can be malicious adversaries trying to attack the learning system.
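For intuition about the aggregation rule described above, here is a minimal Krum-style sketch. It assumes the commonly stated formulation in which each of the n submitted gradients is scored by the summed squared distance to its n - f - 2 nearest neighbors and the lowest-scoring gradient is returned; all names, shapes, and the toy data are illustrative.

```python
import numpy as np

def krum_select(grads, f):
    # Krum-style selection sketch: return the single worker gradient whose
    # summed squared distance to its n - f - 2 nearest neighbors is smallest.
    # `grads` has shape (n, d); `f` is the assumed number of Byzantine workers.
    grads = np.asarray(grads)
    n = len(grads)
    k = n - f - 2                                  # neighbors scored per candidate
    if k < 1:
        raise ValueError("Krum requires n > f + 2")
    diffs = grads[:, None, :] - grads[None, :, :]  # pairwise differences
    dists = np.sum(diffs ** 2, axis=-1)            # (n, n) squared distances
    scores = [np.sort(np.delete(dists[i], i))[:k].sum() for i in range(n)]
    return grads[int(np.argmin(scores))]

# Toy usage: six well-behaved gradients and two outliers; Krum picks an inlier.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(6, 4))
outliers = rng.normal(loc=50.0, scale=0.1, size=(2, 4))
print(krum_select(np.vstack([honest, outliers]), f=2))
```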
Near-Optimal Straggler Mitigation for Distributed Gradient Methods
Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a f...
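To make the master–worker pattern in this excerpt concrete, here is a plain synchronous sketch in which workers compute partial gradients on local data shards and a master averages them. The least-squares loss and all names are illustrative stand-ins; this is not the straggler-mitigation scheme the cited paper itself proposes.

```python
import numpy as np

def partial_gradient(w, X_shard, y_shard):
    # Worker-side step: gradient of a least-squares loss on a local shard
    # (an illustrative stand-in loss, not the paper's setting).
    return X_shard.T @ (X_shard @ w - y_shard) / len(y_shard)

def master_aggregate(w, shards):
    # Master-side step: average the partial gradients returned by the workers.
    grads = [partial_gradient(w, X, y) for X, y in shards]
    return np.mean(grads, axis=0)

# Toy usage: 4 "workers", each holding a shard of a synthetic regression problem.
rng = np.random.default_rng(0)
shards = [(rng.normal(size=(25, 5)), rng.normal(size=25)) for _ in range(4)]
w = np.zeros(5)
for _ in range(100):
    w -= 0.1 * master_aggregate(w, shards)  # one synchronous distributed SGD step
```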
Representative Selection with Structured Sparsity
We propose a novel formulation to find representatives in data samples via learning with structured sparsity. To find representatives with both diversity and representativeness, we formulate the problem as a structurally-regularized learning where the objective function consists of a reconstruction error and three structured regularizers: (1) group sparsity regularizer, (2) diversity regularize...
Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates
In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures—arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving opti...
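The excerpt does not spell out the estimator, but a simple robust aggregator in this spirit is the coordinate-wise median of worker gradients; the sketch below is an illustration under that assumption, not necessarily the rule the cited paper analyzes.

```python
import numpy as np

def coordinate_wise_median(worker_grads):
    # Robust aggregation sketch: take the median of every coordinate across
    # workers, so a minority of arbitrarily corrupted gradients cannot pull
    # the aggregate far in any direction.
    return np.median(np.asarray(worker_grads), axis=0)

# Toy usage: five honest workers and one adversarial one.
honest = [np.ones(3) for _ in range(5)]
byzantine = [np.full(3, 1e6)]
print(coordinate_wise_median(honest + byzantine))  # still [1., 1., 1.]
```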
Journal: CoRR
Volume: abs/1706.05699
Pages: -
Publication date: 2017